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Abstract 

This paper studies the computational 
complexity of disambiguation under 
probabilistic tree-grammars as in (Bod, 
1992; Schabes and Waters, 1993). It 
presents a proof that the following prob- 
lems are NP-hard: computing the Most 
Probable Parse from a sentence or from 
a word-graph, and computing the Most 
Probable Sentence (MPS) from a word- 
graph. The NP-hardness of comput- 
ing the MPS from a word-graph also 
holds for Stochastic Context-Free Gram- 
mars (SCFGs). 

1 Motivation 

Statistical disambiguation is currently a pop- 
ular technique in parsing Natural Language. 
Among the models that implement statistical 
disambiguation one finds the models that em- 
ploy Tree-Grammars such as Data Oriented 
Parsing (DOP) (Scha, 1990; Bod, 1992) and 
Stochastic (Lexicalized) Tree- Adjoining Grammar 
(STAG) (Schabes and Waters, 1993). These mod- 
els extend the domain of locality for expressing 
constraints from simple Context-Free Grammar 
(CFG) productions to deeper structures called 
elementary-trees. Due to this extension, the one 
to one mapping between a derivation and a parse- 
tree, which holds in CFGs, does not hold any 
more; many derivations might generate the same 
parse-tree. This seemingly spurious ambiguity 
turns out crucial for statistical disambiguation as 
defined in (Bod, 1992) and in (Schabes and Wa- 
ters, 1993), where the derivations are considered 
different stochastic processes and their probabili- 
ties all contribute to the probability of the gener- 
ated parse. Therefore the Most Probable Deriva- 
tion (MPD) does not necessarily generate the 
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Most Probable Parse (MPP). 

The problem of computing the MPP in the 
DOP framework was put forward in (Bod, 1995). 
The solution which Bod proposes is Monte-Carlo 
estimation (Bod, 1993), which is essentially re- 
peated random-sampling for minimizing error- 
rate. A Viterbi-style optimization for computing 
the MPP under DOP is presented in (Sima'an et 
al., 1994), but it does not guarantee determin- 
istic polynomial-time complexity. In this paper 
we present a proof that computing the MPP un- 
der the above mentioned stochastic tree gram- 
mars is NP-hard. Note that for computing the 
MPD there are deterministic polynomial-time al- 
gorithms (Schabes and Waters, 1993; Sima'an, 
1996) 1 . Another probl em that turns out also NP- 
hard is computing the Most Probable Sentence 
(MPS) from a given word-graph. But this prob- 
lem turns out NP-hard even for SCFGs. 

Beside the mathematical interest, this work is 
driven by the desire to develop efficient algorithms 
for these problems. Such algorithms can be useful 
for various applications that demand robust and 
faithful disambiguation e.g. Speech Recognition, 
Information Retrieval. This proof provides an ex- 
planation for the source of complexity, and forms 
a license to redirect the research for solutions to- 
wards non-standard optimizations. 

The structure of the paper is as follows. Sec- 
tion 2 briefly discusses the preliminaries. Section 3 
presents the proofs. Section 4 discusses this result, 
points to the source of complexity and suggests 
some possible solutions. The presentation is for- 
mal only where it seemed necessary. 

2 Preliminaries 

2.1 Stochastic Tree-Substitution 
Grammar (STSG) 

STSGs and SCFGs are closely related. STSGs 
and SCFGs are equal in weak generative ca- 

x The author notes that the actual accuracy figures 
of the experiments listed in (Sima'an, 1995) are much 
higher than the accuracy figures reported in the paper. 
The lower figures reported in that paper are due to a 
test-procedure. 



pacity (i.e. string languages). This is not the 
case for strong generative capacity (i.e. tree lan- 
guages); STSGs can generate tree-languages that 
are not generatable by SCFGs. An STSG 
is a five-tuple (Vjv, Vr, S, C, PT), where Vm and 
Vt denote respectively the finite set of non- 
terminal and terminal symbols, S denotes the 
start non-terminal, C is a finite set of elementary- 
trees (of arbitrary depth > 1) and PT is a function 
which assigns a value < PT(t) < 1 (proba- 
bility) to each elementary-tree t such that for all 
N eV N : £ t6C , root{t)=N PT(t) = 1 (where root(t) 
denotes the root of tree t). An elementary-tree 
in C has only non-terminals as internal nodes but 
may have both terminals and non-terminals on its 
frontier. A non-terminal on the frontier is called 
an Open- Tree (OT). If the left-most open-tree 
N of tree t is equal to the root of tree tl then 
totl denotes the tree obtained by substituting tl 
for N in t. The partial function o is called left- 
most substitution. A left-most derivation 
(l.m.d.) is a sequence of left-most substitutions 
Imd = (. . . (ii oi 2 ) o . . .) ot n , where t±, . . . , t n £ C , 
root(ti) = S and the frontier of Imd consists of 
only terminals. The probability P(lmd) is de- 
fined as PT(ii) x ... x PT(t n ). For convenience, 
derivation in the sequel refers to l.m. derivation. 
A Parse is a tree generated by a derivation. A 
parse is possibly generatable by many derivations. 
The probability of a parse is defined as the sum of 
the probabilities of the derivations that generate 
it. The probability of a sentence is the sum of the 
probabilities of all derivations that generate that 
sentence. 

A word-graph over the alphabet Q is Qi x 
• • • x Q m , where Qi C Q, for all 1 < i < m. We 
denote this word-graph with Q m if Qi = Q, for 
all 1 < i < m. 

2.2 The 3SAT problem 

It is sufficient to prove that a problem is NP-hard 
in order to prove that it is intractable. A problem 
is NP-hard if it is (at least) as hard as any problem 
that has been proved to be NP-complete (i.e. a 
problem that is known to be decidable on a non- 
deterministic Turing Machine in polynomial-time 
but not known to be decidable on a deterministic 
Turing Machine in polynomial-time). To prove 
that problem A is as hard as problem B, one shows 
a reduction from problem B to problem A. The 
reduction must be a deterministic polynomial time 
transformation that preserves answers. 

The NP-complete problem which forms our 
starting-point is the 3SAT (satisfiability) problem. 
An instance INS of 3SAT can be stated as follows 2 : 

Given an arbitrary 3 Boolean formula in 
3-conjunctive normal form (3CNF) over 

2 In the sequel, INS, INS's formula and its symbols 
refer to this particular instance of 3SAT. 

3 Without loss of generality we assume that the for- 



the variables u±, . . . , u n . Is there an as- 
signment of values true or false to the 
Boolean variables such that the given 
formula is true ? Let us denote the given 
formula by C\ A C 2 A • • • A C m for m > 1 
where Ci represents (dn V di2 V 0^3), 
for 1 < i < m, 1 < j < 3, and 
dij represents a literal Uf. or Uf. for some 
1 < k < n. 

Optimization problems are known to be 
(at least) as hard as their decision counter- 
parts (Garey and Johnson, 1981). The deci- 
sion problem related to maximizing a quantity M 
which is a function of a variable V can be stated 
as follows: is there a value for V that makes the 
quantity M greater than or equal to a predeter- 
mined value m. The decision problems related to 
disambiguation under DOP can be stated as fol- 
lows, where G is an STSG, WG is a word-graph, 
Wq is a sentence and < p < 1: 

MPPWG Does the word-graph WG have any 
parse, generatable by the STSG G, that has 
probability value greater than or equal to p ? 

MPS Does the word-graph WG contain any sen- 
tence, generatable by the STSG G, that has 
probability value greater than or equal to p ? 

MPP Does the sentence w™ have a parse gener- 
atable by the STSG G, that has probability 
value greater than or equal to p ? 

Note that in the sequel MPPWG / MPS / MPP 
denotes the decision problem corresponding to the 
problem of computing the MPP / MPS / MPP 
from a word-graph / word-graph / sentence re- 
spectively. 

3 Complexity of MPPWG, MPS 
and MPP 

3.1 3SAT to MPPWG and MPS 

The reduction from the 3SAT instance INS to 
an MPPWG problem must construct an STSG 
and a word-graph in deterministic polynomial- 
time. Moreover, the answers to the MPPWG 
instance must correspond exactly to the an- 
swers to INS. The presentation of the reduc- 
tion shall be accompanied by an example of the 
following 3SAT instance (Barton et al., 1987): 
(wi V u 2 V u 3 ) A (wi V u 2 V u 3 ). Note that a 3SAT 
instance is satisfiable iff at least one of the liter- 
als in each conjunct is assigned the value True. 
Implicit in this, but crucial, the different occur- 
rences of the literals of the same variable must be 
assigned values consistently. 

Reduction: The reduction constructs an STSG 
and a word-graph. The STSG has start-symbol 
S, two terminals represented by T and F , non- 
terminals which include (beside S) all Ck, for 

mula does not contain repetition of conjuncts. 
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Figure 1: The elementary-trees for the example 3SAT instance 



1 < k < m, and both literals of each Boolean vari- 
able of the formula of INS. The set of elementary- 
trees and probability function and the word-graph 
are constructed as follows: 

1. For each Boolean variable Ui, 1 < i < n, 
construct two elementary-trees that corre- 
spond to assigning the values true and false 
to Ui consistently through the whole formula. 
Each of these elementary-trees has root S, 
with children Cfc, 1 < k < m, in the 
same order as these appear in the formula 
of INS; subsequently the children of Ck are 
the non-terminals that correspond to its three 
disjuncts dki, dk2 and dk3- And finally, the 
assignment of true (false) to Ui is modeled by 
creating a child terminal T (resp. F ) to each 
non-terminal Ui and F (resp. T ) to each 
Ui. The two elementary-trees for u±, of our 
example, are shown in the top left corner of 
figure 1. 

2. The reduction constructs three elementary- 
trees for each conjunct Ck- The three 
elementary-trees for conjunct Ck have the 
same internal structure: root Ck, with 
three children that correspond to the dis- 
juncts dki, dk2 and 0^3. In each of these 



elementary-trees exactly one of the disjuncts 
has as a child the terminal T ; in each of 
them this is a different one. Each of these 
elementary-trees corresponds to the conjunct 
where one of the three possible literals is as- 
signed the value T . For the elementary-trees 
of our example see the top right corner of fig- 
ure 1. 

3. The reduction constructs for each of the two 
literals of each variable Ui two elementary- 
trees where the literal is assigned in one case 
T and in the other F . Figure 1 shows these 
elementary-trees for variable ui in the bottom 
left corner. 

4. The reduction constructs one elementary- 
tree that has root S with children Ck, 
1 < k < m, in the same order as these 
appear in the formula of INS (see the bottom 
right corner of figure 1). 

5. The probabilities of the elementary-trees that 
have the same root non-terminal sum up to 1. 
The probability of an elementary-tree with 
root S that was constructed in step 1 of this 
reduction is a value pi, 1 < i < n, where 
Ui is the only variable of which the literals 
in the elementary-tree at hand are lexical- 



ized (i.e. have terminal children). Let rii de- 
note the number of occurrences of both liter- 
als of variable Ui in the formula of INS. Then 
Pi = 6 (|) ni , for some real 6 that has to fulfill 
some conditions which will be derived next. 
The probability of the tree rooted with S and 
constructed at step 4 of this reduction must 
then be po = [1 — 2^™ =1 pi]. The proba- 
bility of the elementary-trees of root C\ (step 
2) is (|), and of root ui or ui (step 3) is (|). 
For our example some suitable probabilities 
are shown in figure 1. 

6. Let Q denote a threshold probability that 
shall be derived hereunder. The MPPWG 
(MPS) instance is: does the STSG generate 
a parse (resp. sentence) of probability > Q, 
for the word-graph WG = {T, F} 3m ? 

Deriving the probabilities: The parses gen- 
erated by the constructed STSG differ only in the 
sentences on their frontiers. Therefore, if a sen- 
tence is generated by this STSG then it has ex- 
actly one parse. This justifies the choice to reduce 
3SAT to MPPWG and MPS simultaneously. 

One can recognize two types of derivations in 
this STSG. The first type corresponds to substi- 
tuting for an open-tree (i.e literal) of any of the 
2n elementary-trees constructed in step 1 of the 
reduction. This type of derivation corresponds to 
assigning values to all literals of some variable Ui 
in a consistent manner. For all 1 < i < n the prob- 
ability of a derivation of this type is 
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The second type of derivation corresponds to 
substituting the elementary-trees rooted with C\ 
in S — >■ Ci . . . C m , and subsequently substituting 
in the open-trees that correspond to literals. This 
type of derivation corresponds to assigning to at 
least one literal in each conjunct the value true. 
The probability of any such derivation is 



i = l 



Now we derive both the threshold Q and the 
parameter 6. Any parse (or sentence) that ful- 
fills both the "consistency of assignment" require- 
ments and the requirement that each conjunct has 
at least one literal with child T , must be gen- 
erated by n derivations of the first type and at 
least one derivation of the second type. Note that 
a parse can never be generated by more than n 
derivations of the first type. Thus the threshold 
Q is: 

Q = ne{\f m + [i-2eY J {\r]{\f m {\r 

i = l 



1. For all i: < pi < 1. This means that for 
1 < i < n: < 6{\) ni < 1, and 
< Po < 1- However, the last requirement on 
po implies that < 20£)" =1 (§) ni < 1, 
which is a stronger requirement than 
the other n requirements. This re- 
quirement can also be stated as follows: 

0<6< 

2. Since we want to be able to know whether a 
parse is generated by a second type deriva- 
tion only by looking at the probability of the 
parse, the probability of a second type deriva- 
tion must be distinguishable from first type 
derivations. Moreover, if a parse is generated 
by more than one derivation of the second 
type, we do not want the sum of the prob- 
abilities of these derivations to be mistaken 
for one (or more) first type derivation(s). 
For any parse, there are at most 3 m second 
type derivations (e.g. the sentence T . . .T ). 
Therefore we require that: 

i=i 

Which is equal to 6 > 



3. For the resulting STSG to be a probabilis- 
tic model, the "probabilities" of parses and 
sentences must be in the interval (0, 1]. This 
is taken care of by demanding that the sum 
of the probabilities of elementary-trees that 
have the same root non-terminal is 1, and 
by the definition of the derivation's probabil- 
ity, the parse's probability, and the sentence's 
probability. 

There exists a 6 that fulfills all these requirements 
because the lower bound ^„ — , ' 1 is 

always larger than zero and is strictly smaller than 
the upper bound v->" Vi'' ' 

Polynomiality of the reduction: This reduc- 
tion is deterministic polynomial-time in n because 
it constructs not more than 2n + 1 + 3m + 4n 
elementary-trees of maximum number of nodes 4 
7m + 1. 

The reduction preserves answers: The 

proof concerns the only two possible answers. 

Yes If INS's answer is Yes then there is an as- 
signment to the variables that is consistent 
and where each conjunct has at least one lit- 
eral assigned true. Any possible assignment 
is represented by one sentence in WG. A 
sentence which corresponds to a "successful" 
assignment must be generated by n deriva- 
tions of the first type and at least one deriva- 
tion of the second type; this is because the 



However, 6 must fulfill some requirements for our 
reduction to be acceptable: 



4 Note than m is polynomial in n because the for- 
mula does not contain two identical conjuncts. 



sentence w\ m fulfills n consistency require- 
ments (one per Boolean variable) and has at 
least one T as w 3k+1 , w 3k+2 or w 3k+3 , for all 
< k < m. Both this sentence and its 
corresponding parse have probability > Q. 
Thus MPPWG and MPS also answer Yes. 

No If INS's answer is No, then all possible assign- 
ments are either not consistent or result in at 
least one conjunct with three false disjuncts, 
or both. The sentences (parses) that cor- 
respond to non-consistent assignments each 
have a probability that cannot result in a Yes 
answer. This is the case because such sen- 
tences have fewer than n derivations of the 
first type, and the derivations of the second 
type can never compensate for that (the re- 
quirements on 6 take care of this). For the 
sentences (parses) that correspond to con- 
sistent assignments, there is at least some 
< k < m such that w 3k+ i, w 3k+ 2 and 
w 3k + 3 are all F . These sentences do not have 
second type derivations. Thus, there is no 
sentence (parse) that has a probability that 
can result in a Yes answer; the answer of MP- 
PWG and MPS is NO. 

We conclude that MPPWG and MPS are both 
NP-hard problems. 

Now we show that MPPWG and MPS are in 
NP. A problem is in NP if it is decidable by 
a non-deterministic Turing machine. The proof 
here is informal: we show a non-deterministic al- 
gorithm that keeps proposing solutions and then 
checking each of them in deterministic polyno- 
mial time cf. (Barton et al., 1987). If one solu- 
tion is successful then the answer is Yes. One 
possible non-deterministic algorithm for the MP- 
PWG and MPS, constructs firstly a parse-forest 
for WG in deterministic polynomial time based 
on the algorithms in (Schabes and Waters, 1993; 
Sima'an, 1996), and subsequently traverses this 
parse-forest (bottom-up for example) deciding at 
each point what path to take. Upon reaching 
the start non-terminal S, it retrieves the sen- 
tence (parse) and evaluates it in deterministic 
polynomial-time (Sima'an et al., 1994), thereby 
answering the decision problem. 

This concludes the proof that MPPWG and 
MPS are both NP-complete. 

3.2 NP-completeness of MPP 

The NP-completeness of MPP can be easily de- 
duced from the previous section. In the reduction 
the terminals of the constructed STSG are new 
symbols Vij, 1 < i < m and 1 < j < 3, 
instead of T and F that become non-terminals. 
Each of the elementary-trees with root S or C k 
is also represented here but each T and F on 
the frontier has a child v k j wherever the T or 
F appears as the child of the jth. child (a lit- 
eral) of C k . For each elementary-tree with root 



Ui or Ui, there are 3m elementary-trees in the new 
STSG that correspond each to creating a child 
Vij for the T or F on its frontier. The proba- 
bility of an elementary-tree rooted by a literal is 
g^j. The probabilities of elementary-trees rooted 
with C k do not change. And the probabilities of 
the elementary-trees rooted with S are adapted 
from the previous reduction by substituting for 
every (|) the value The threshold Q and the 
requirements on 6 are also updated accordingly. 
The input sentence which the reduction constructs 
is simply »n . . . v 3m . The decision problem is 
whether there is a parse generated by the resulting 
STSG for this sentence that has probability larger 
than or equal to Q. 

The rest of the proof is very similar to that in 
section 3. Therefore the decision problem MPP is 
NP-complete. 

3.3 MPS under SCFG 

The decision problem MPS is NP-complete also 
under SCFG. The proof is easily deducible from 
the proof concerning MPS for STSGs. The reduc- 
tion simply takes the elementary-trees of the MPS 
for STSGs and removes their internal structure, 
thereby obtaining simple CFG productions. Cru- 
cially, each elementary-tree results in one unique 
CFG production. The probabilities are kept the 
same. The word-graph is also the same word- 
graph as in MPS for STSGs. The problem is: 
does the SCFG generate a sentence with probabil- 
ity > Q, for the word-graph WG = {T, F} 3m . 
The rest of the proof follows directly from sec- 
tion 3. 

4 Conclusion and discussion 

We conclude that computing the MPP / MPS / 
MPP from a sentence / word-graph / word-graph 
respectively is NP-hard under DOP. Computing 
the MPS from a word-graph is NP-hard even un- 
der SCFGs. Moreover, these results are applicable 
to STAG as in (Schabes and Waters, 1993). 

The proof of the previous section helps in un- 
derstanding why computing the MPP in DOP is 
such a hard problem. The fact that MPS under 
SCFG is also NP-hard implies that the complex- 
ity of the MPPWG, MPS and MPP is due to the 
definitions of the probabilistic model rather than 
the complexity of the syntactic model. 

The main source of NP-completeness is the fol- 
lowing common structure of these problems: they 
all search for an entity that maximizes the sum 
of the probabilities of processes which depend on 
that entity. For the MPS problem of SCFGs for 
example, one searches for the sentence which max- 
imizes the sum of the probabilities of the parses 
that generate that sentence (i.e. the probability 
of a parse is also a function of whether it gener- 
ates the sentence at hand or not). This is not the 



case, for example, when computing the MPD un- 
der STSGs (for sentence or even a word-graph), 
or when computing the MPP under SCFGs (for a 
sentence or a word-graph). 

The proof in this paper is not a mere theoretical 
issue. An exponential algorithm can be compara- 
ble to a deterministic polynomial algorithm if the 
grammar-size can be neglected and if the expo- 
nential formula is not much worse than the poly- 
nomial for realistic sentence lengths. But as soon 
as the grammar size becomes an important factor 
(e.g. in DOP), polynomiality becomes a very de- 
sirable quality. For example \G\ e n and \G\ n 3 for 
n < 7 are comparable but for n = 12 the poly- 
nomial is some 94 times faster. If the grammar 
size is small and the comparison is between 0.001 
seconds and 0.1 seconds this might be of no prac- 
tical importance. But when the grammar size is 
large and the comparison is between 60 seconds 5 
and 5640 seconds for a sentence of length 12, then 
things become different. 

To compute the MPP under DOP, one possi- 
ble solution involves some heuristic that directs 
the search towards the MPP; a form of this strat- 
egy is the Monte-Carlo technique. Another so- 
lution might involve assuming Memory-based be- 
havior in directing the search towards the most 
"suitable" parse according to some heuristic eval- 
uation function that is inferred from the proba- 
bilistic model. And a third possible solution is to 
adjust the probabilities of elementary-trees such 
that it is not necessary to compute the MPP. The 
probability of an elementary-tree can be redefined 
as the sum of the probabilities of all derivations 
that generate it in the given STSG. This redefi- 
nition can be applied by off-line computation and 
normalization. Then the probability of a parse is 
redefined as the probability of the MPD that gen- 
erates it, thereby collapsing the MPP and MPD. 
This method assumes full independence beyond 
the borders of elementary-trees, which might be 
an acceptable assumption. 

Finally, it is worth noting that the solutions 
that we suggested above are merely algorithmic. 
But the ultimate solution to the complexity of 
probabilistic disambiguation under the current 
models lies, we believe, only in further incorpo- 
ration of the crucial elements of the human pro- 
cessing ability into these models. 
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